Abstract

The next generation of CMB experiments can measure cosmological 
parameters with unprecedented accuracy — in principle. To achieve 
this in practice when faced with such gigantic data sets, elaborate data 
analysis methods are needed to make it computationally feasible. An 
important step in the data pipeline is to make a map, which typically 
reduces the size of the data set by orders of magnitude. We compare 
ten map-making methods, and find that for the Gaussian case, both 
the method used by the COBE DMR team and various forms of Wiener 
filtering are optimal in the sense that the map retains all cosmological 
information that was present in the time-ordered data (TOD). Specif- 
ically, one obtains just as small error bars on cosmological parameters 
when estimating them from the map as one could have obtained by es- 
timating them directly from the TOD. The method of simply averaging 
the observations of each pixel (for total-power detectors), on the con- 
trary, is found to generally destroy information, as does the maximum 
entropy method and most other non-linear map-making techniques. 

Since it is also numerically feasible, the COBE method is the nat- 
ural choice for large data sets. Other lossless (e.g. Wiener-filtered) 
maps can then be computed directly from the COBE method map. 
1 INTRODUCTION



A large number of Cosmic Microwave Background (CMB) experiments that 
cover extended patches of sky are currently in the phases of planning, design 
or data analysis, and they all have as partial goals to produce temperature 
maps. Since there is a plethora of map-making methods available, many
experimental groups are currently debating which one(s) to use when
reducing their raw data. Indeed, it is indicative that the maps based on
COBE (Smoot et al. 1992; Bennett et al. 1996), MAX (White & Bunn 
1995), Saskatoon (Tegmark et al. 1996) and Tenerife were made using four 
different methods, two linear and two non-linear. It is therefore quite timely 
to compare the various methods and assess their relative merits. The pur- 
pose of this Letter is to provide such a comparison. Note that we use the 
term "map-making" to refer to the data reduction process — for a discussion 
of important options in the data acquisition process such as scanning and 
chopping strategy, see e.g. Knox (1996) and Wright (1996). 

Which map-making method is preferable clearly depends on what the 
map is to be used for. Common uses for CMB maps (apart from satisfying 
a general desire to map the sky in as many frequency bands as possible) are 

• to facilitate comparison with other experiments. 

• to facilitate comparison with foreground templates such as the DIRBE 
maps. 

• to reveal flaws in the model that are not visible in the power spec- 
trum, such as non-Gaussian CMB features, point sources and spatially 
distinctive systematic problems. 

As CMB experiments collect larger and larger data sets, yet another use 
for map-making has emerged: as a data-compression step that makes it 
computationally feasible to constrain cosmological parameters. To a good 
approximation (Tegmark, Taylor & Heavens 1997), one obtains the smallest
possible error bars on estimates of cosmological parameters (such as Λ,
etc.) by performing a likelihood analysis using the entire data set. So far, the
small-scale experiments have all produced n ≪ 10^ data points, which means
that it has been feasible to carry out such a "brute force" analysis. Assuming
Gaussianity in the distribution of pixel temperatures and instrument noise,
this entails computing determinants of n x n matrices at a grid of points
in parameter space, and the time this takes scales as n^3. For the
time-ordered data (TOD) of COBE (with n ~ 2 x 10^), such brute-force analysis
is completely unfeasible at present, not to mention the even larger data
sets of the upcoming MAP and COBRAS/SAMBA satellites. Map-making
offers a useful way to reduce the data set down to a more manageable size, 
for instance down to 6144 numbers in the case of COBE and 10^ — 10^ for 
future satellite missions. The parameters can then be estimated from the 
maps with the brute force approach (Tegmark & Bunn 1995; Hinshaw et al.
1996) or with some faster and more elaborate scheme. This is schematically
illustrated in Figure 1. This purely pragmatic approach to map-making,
as a mere time-saving device, offers an objective quantitative way to rank 
map-making methods: one method is better than another if it retains more 
of the cosmological information, which operationally means that it will lead 
to smaller error bars on the parameter estimates. 

The rest of this Letter is organized as follows. We describe ten map- 
making methods in Section 2, compare them according to this criterion in 
Section 3 and summarize our conclusions in Section 4. 

2 A LIST OF METHODS 

2.1 The mapping problem 

Suppose we have measured n numbers $y_1, \ldots, y_n$, which we will refer to as
the raw data or the time-ordered data (TOD), and wish to use this TOD
to estimate a set of m numbers $x_1, \ldots, x_m$, which we will refer to as a map.
Typically, our map would be pixelized and $x_i$ would denote the temperature
in pixel i. We will limit our treatment to the case where the time-ordered 
data (TOD) depends linearly on the map. Grouping the TOD and the map 
into an n-dimensional vector y and an m-dimensional vector x, respectively, 
this means that we can write 

y = Ax + n (1) 

for some known matrix A and some random noise vector n. Despite the 
linearity limitation, this formalism is still very general. The numbers in 
the vector x need not be restricted to CMB temperatures in various pixels, 
but can also include any other unknown parameters upon which the TOD 
depends linearly. For instance, the COBE analysis included a fit for three 
magnetic susceptibility coefficients (Wright et al. 1996), and in many cases, 
it may also be convenient to include various calibration-related parameters in
x. To remove foregrounds, the TOD vector y can be expanded to include the
temperatures measured at several different frequencies. In this case, x would
be augmented to include the brightness of various foreground components
in each pixel, and the matrix A would encompass the assumptions made
about their frequency dependence.

No.  Method               Specification
1    Generalized COBE     $W = [A^t M A]^{-1} A^t M$
2    Bin averaging        $W = [A^t A]^{-1} A^t$
3    COBE                 $W = [A^t N^{-1} A]^{-1} A^t N^{-1}$
4    Wiener 1             $W = S A^t [A S A^t + N]^{-1}$
5    Wiener 2             $W = [S^{-1} + A^t N^{-1} A]^{-1} A^t N^{-1}$
6    Saskatoon            $W = [\eta S^{-1} + A^t N^{-1} A]^{-1} A^t N^{-1}$
7    TE96                 $W = \Lambda S A^t [A S A^t + N]^{-1}$, $(WA)_{ii} = 1$
8    TE97                 $W = \Lambda [\eta S^{-1} + A^t N^{-1} A]^{-1} A^t N^{-1}$, $(WA)_{ii} = 1$
9    Maximum probability  Nonlinear method if non-Gaussian
10   Maximum entropy      Nonlinear method

Table 1: Map-making methods
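
The linear model of equation (1) can be made concrete with a small numerical sketch. It assumes "total power" observations, so that each row of A contains a single 1; the scan pattern, pixel temperatures and noise level below are hypothetical values chosen only for illustration.

```python
import numpy as np

def pointing_matrix(pixel_of_sample, n_pixels):
    """Total-power pointing matrix A: row i has a single 1 in the column
    of the pixel observed at time step i."""
    n_samples = len(pixel_of_sample)
    A = np.zeros((n_samples, n_pixels))
    A[np.arange(n_samples), pixel_of_sample] = 1.0
    return A

def simulate_tod(A, x, noise_cov, rng):
    """Draw one realization of y = A x + n with n ~ N(0, noise_cov)."""
    n = rng.multivariate_normal(np.zeros(A.shape[0]), noise_cov)
    return A @ x + n

rng = np.random.default_rng(0)
scan = np.array([0, 1, 2, 0, 1, 2, 2, 1])   # hypothetical scan pattern
A = pointing_matrix(scan, n_pixels=3)
x_true = np.array([10.0, -5.0, 2.0])        # hypothetical pixel temperatures
y = simulate_tod(A, x_true, 0.01 * np.eye(len(scan)), rng)
```

Extra unknowns (calibration parameters, foreground components) would simply add columns to A and entries to x in this picture.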

Without loss of generality, we can take the noise vector to have zero
mean, i.e., $\langle n \rangle = 0$, so the noise covariance matrix is

$N = \langle n n^t \rangle$. (2)

In some of the methods described below (methods 4-9), the following prior
assumptions are made about the map: it is assumed to be a realization of
a random vector with zero mean, i.e., $\langle x \rangle = 0$, with some known covariance
matrix

$S = \langle x x^t \rangle$ (3)

and uncorrelated with the noise, i.e., $\langle n x^t \rangle = 0$.

2.2 Ten mapping methods 

We will now summarize some map-making methods that have recently been 
used or advocated in the CMB context. All linear methods can clearly be 
written in the form 

$\tilde{x} = W y$, (4)

where $\tilde{x}$ denotes the estimate of the map x and W is some m x n matrix
that specifies the method. Table 1 shows the choices of W that define the
linear methods we will discuss.
Method 1 has the attractive property that WA = I, which means that
the reconstruction error $\varepsilon$, defined as

$\varepsilon \equiv \tilde{x} - x = [WA - I] x + W n$ (5)

becomes independent of x. In other words, the recovered map $\tilde{x}$ is simply
the true map x plus some noise that is independent of the signal one is
trying to measure. Here M is an arbitrary n x n matrix.

Method 2 is the special case of Method 1 for which M = I. It can
be derived by minimizing $|y - Ax|$, the mismatch between the observed
and expected data sets (Dodelson 1996). If the data set consists of "total
power" (undifferenced) observations of the sky, then the ith row of A will
vanish except for a 1 in the column corresponding to the pixel observed
at time step i, and it is easy to see that Method 2 corresponds to simply
averaging the measurements of each pixel. As we will see, this is an inferior
method when noise correlations (due to, for instance, 1/f noise) are present.
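
The equivalence between Method 2 and plain pixel averaging can be checked on a toy example. This is a sketch only, assuming total-power data and a hypothetical six-sample scan of three pixels.

```python
import numpy as np

def bin_average_W(A):
    """Method 2: W = [A^t A]^{-1} A^t (requires every pixel to be observed)."""
    return np.linalg.solve(A.T @ A, A.T)

scan = np.array([0, 1, 0, 2, 1, 0])          # hypothetical scan pattern
A = np.zeros((len(scan), 3))
A[np.arange(len(scan)), scan] = 1.0          # one 1 per row (total power)
y = np.array([1.0, 4.0, 3.0, 7.0, 6.0, 2.0])  # hypothetical TOD

x_method2 = bin_average_W(A) @ y
# Direct per-pixel average of the samples that hit each pixel.
x_mean = np.array([y[scan == p].mean() for p in range(3)])
```

For total-power data A^t A is diagonal and counts the hits per pixel, which is why the matrix expression collapses to an average.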

Method 3, the method used by the COBE/DMR team (Jansen & Gulkis
1992), is the special case of Method 1 where $M = N^{-1}$. It is straightforward
to prove that it has the following three desirable properties:

1. It minimizes $\chi^2 \equiv (y - Ax)^t N^{-1} (y - Ax)$.

2. It minimizes $\langle |\varepsilon|^2 \rangle$ subject to the constraint WA = I.

3. It is the maximum-likelihood estimate of x if the probability distribution
for n is Gaussian.

For this method, the noise covariance matrix in the map is
$\Sigma \equiv \langle \varepsilon \varepsilon^t \rangle = [A^t N^{-1} A]^{-1}$.
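
A minimal dense-matrix sketch of Method 3 follows. It is only an illustration of the formulas, not the algorithm used by the COBE team; the scan pattern and the correlated noise covariance `N` are hypothetical, and the approach is practical only for small n.

```python
import numpy as np

def cobe_map(A, N, y):
    """Method 3: x = Sigma A^t N^{-1} y with Sigma = [A^t N^{-1} A]^{-1}."""
    Ninv_A = np.linalg.solve(N, A)          # N^{-1} A, without forming N^{-1}
    Sigma = np.linalg.inv(A.T @ Ninv_A)     # noise covariance of the map
    W = Sigma @ Ninv_A.T                    # equals Sigma A^t N^{-1} for symmetric N
    return W @ y, W, Sigma

scan = np.array([0, 1, 2, 0, 1, 2, 2, 1])   # hypothetical scan pattern
n, m = len(scan), 3
A = np.zeros((n, m))
A[np.arange(n), scan] = 1.0
lag = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
N = 0.5 ** lag                              # toy correlated noise covariance
x_true = np.array([10.0, -5.0, 2.0])
x_map, W, Sigma = cobe_map(A, N, A @ x_true)  # noise-free TOD for the check
```

With noise-free input the map reproduces `x_true` because WA = I, and propagating N through W gives back Sigma.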

Method 4, known as Wiener filtering (Wiener 1949), can be derived in
two ways (see e.g. Bunn et al. 1994, Zaroubi et al. 1995):

1. It minimizes $\langle |\varepsilon|^2 \rangle$.

2. It is the maximum posterior probability estimate of x in a Bayesian
analysis if the probability distributions for n and x are Gaussian.

It is stable even for "poorly connected" observations where $[A^t M A]$ is
singular or ill-conditioned. Although Method 5 looks different, it is in fact
identical to Method 4. This can be proven using the same geometric series
trick that is employed in equation (15) below. It is computationally preferable
to Method 4 if the matrix to be inverted is smaller, i.e., if m < n.
Method 6 lets the user choose a desired signal-to-noise ratio in the
reconstructed map by means of the parameter $\eta$, and was used in generating the
maps from the Saskatoon experiment (Tegmark et al. 1996). The COBE
method clearly corresponds to the special case $\eta \to 0$.
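
The identity of Methods 4 and 5, and the COBE limit of Method 6, can be verified numerically. The sketch below uses hypothetical toy matrices A, S and N; the function names are illustrative only.

```python
import numpy as np

def wiener1(A, S, N):
    """Method 4: W = S A^t [A S A^t + N]^{-1} (inverts an n x n matrix)."""
    return S @ A.T @ np.linalg.inv(A @ S @ A.T + N)

def saskatoon(A, S, N, eta):
    """Method 6: W = [eta S^{-1} + A^t N^{-1} A]^{-1} A^t N^{-1}.
    eta = 1 gives Method 5; eta = 0 gives Method 3 (inverts an m x m matrix)."""
    Ninv = np.linalg.inv(N)
    return np.linalg.inv(eta * np.linalg.inv(S) + A.T @ Ninv @ A) @ A.T @ Ninv

scan = np.array([0, 1, 2, 0, 1, 2, 2, 1])   # hypothetical scan pattern
n, m = len(scan), 3
A = np.zeros((n, m))
A[np.arange(n), scan] = 1.0
N = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))        # toy noise
S = 4.0 * 0.7 ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))  # toy signal

W4 = wiener1(A, S, N)
W5 = saskatoon(A, S, N, eta=1.0)
W3 = saskatoon(A, S, N, eta=0.0)
```

Here `W4` and `W5` agree to rounding error, while `W3` satisfies WA = I as expected for the COBE method.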

Wiener filtering generally gives less noisy maps at the price of suppressing
the power in different pixels unequally. This is remedied by Method 7,
which simply multiplies W by a diagonal matrix $\Lambda$ (rescales each pixel) so
that $(WA)_{ii} = 1$ for all i. This method can also be derived by minimizing
$\langle |\varepsilon|^2 \rangle$ subject to the constraint $(WA)_{ii} = 1$ (Tegmark & Efstathiou 1996).
In that paper, x did not denote a map but the CMB and foreground fluctuations
in a given mode, but the mathematics is of course identical. Method
8 simply combines the features of 6 and 7, and is relevant to the foreground
problem (Tegmark & Efstathiou 1997).
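
The rescaling step of Method 7 amounts to dividing each row of the Wiener matrix by the corresponding diagonal element of WA. A short sketch, again with hypothetical toy matrices:

```python
import numpy as np

def rescaled_wiener(A, S, N):
    """Method 7: Wiener filter (Method 4) followed by a diagonal rescaling
    Lambda chosen so that (W A)_ii = 1 for every pixel i."""
    W = S @ A.T @ np.linalg.inv(A @ S @ A.T + N)   # Method 4
    Lam = np.diag(1.0 / np.diag(W @ A))            # assumes (W A)_ii != 0
    return Lam @ W

scan = np.array([0, 1, 2, 0, 1, 2, 2, 1])   # hypothetical scan pattern
n, m = len(scan), 3
A = np.zeros((n, m))
A[np.arange(n), scan] = 1.0
N = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))        # toy noise
S = 4.0 * 0.7 ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))  # toy signal

W7 = rescaled_wiener(A, S, N)
```

Since the rescaling matrix is diagonal with non-zero entries, it is invertible, so the rescaled map carries the same information as the Wiener map.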

As mentioned above, Method 9 (the maximum posterior probability 
method) reduces to Wiener filtering when all probability distributions are 
Gaussian. When this is not the case, $\tilde{x}$ is a non-linear function of y which
must generally be determined numerically. A special case of this is Method
10, the Maximum Entropy Method (MEM) (see e.g. Press et al. 1992; White
& Bunn 1995), which is also non-linear. Here the prior probability
distribution involves the entropy of the map, a measure of how smooth and
featureless it is. 

3 WHICH METHODS DESTROY INFORMATION? 

Which of the above-mentioned map-making methods is preferable clearly 
depends on what the map is to be used for. However, if the map is to 
be used for constraining cosmological parameters, we can make quite strong
statements as to which methods are better and which are worse. Specifically, 
we will consider a method to be better than another if the map it produces 
allows the cosmological parameters to be measured with smaller error bars. 

3.1 The Fisher Information Matrix 

Let $\Theta$ denote a vector consisting of the parameters we wish to estimate. For
instance, Jungman et al. (1996) assess attainable accuracies by choosing 

$\Theta = (\Omega, \Omega_b, h, \Lambda, n_S, \alpha, n_T, T/S, \tau, Q, N_\nu)$, (6)

the density parameter, the baryon density, the Hubble parameter, the cos- 
mological constant, the spectral index of scalar fluctuations, the "running" of 
this index, the spectral index of tensor fluctuations, the quadrupole tensor- 
to-scalar ratio, the optical depth to reionization, and the number of light 
neutrino species, respectively. As described in detail in Tegmark, Taylor &
Heavens (1997), the best possible unbiased estimates of these parameters
will have a covariance matrix that is well approximated by $F^{-1}$, where F is
known as the Fisher Information Matrix. For the case where the data has a
Gaussian distribution with zero mean and a covariance matrix C, F is given
by

$F_{ij} = \frac{1}{2} \mathrm{tr}[G_i G_j]$, (7)

where

$G_i \equiv C^{-1} C_{,i}$ (8)

and the comma notation $C_{,i}$ is shorthand for $\partial C / \partial \theta_i$. This means that if all
parameters except $\theta_i$ are known, the data set contains enough information to
determine $\theta_i$ with error bar $\Delta\theta_i = 1/\sqrt{F_{ii}}$, whereas if we need to determine all
parameters jointly, we can obtain $\Delta\theta_i = \sqrt{(F^{-1})_{ii}}$. It is in this sense that F is
a measure of how much information the data contains about the parameters,
and loosely speaking, the larger F is, the better.
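
Equations (7) and (8) translate directly into a few lines of code. The sketch below assumes a hypothetical two-parameter toy model, $C = \theta_1 S_0 + \theta_2 I$, chosen only so that the derivatives $C_{,i}$ are trivial; it is not one of the cosmological models discussed in the text.

```python
import numpy as np

def fisher_matrix(C, dC):
    """F_ij = (1/2) tr[G_i G_j] with G_i = C^{-1} C_{,i}, for zero-mean
    Gaussian data with covariance C. dC is the list of derivatives C_{,i}."""
    Cinv = np.linalg.inv(C)
    G = [Cinv @ dCi for dCi in dC]
    p = len(G)
    F = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            F[i, j] = 0.5 * np.trace(G[i] @ G[j])
    return F

m = 4
S0 = 0.7 ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))  # toy signal shape
theta = np.array([2.0, 1.0])                   # hypothetical parameter values
C = theta[0] * S0 + theta[1] * np.eye(m)
F = fisher_matrix(C, [S0, np.eye(m)])

err_conditional = 1.0 / np.sqrt(np.diag(F))            # other parameters known
err_marginal = np.sqrt(np.diag(np.linalg.inv(F)))      # all parameters fitted jointly
```

The marginal error bars are never smaller than the conditional ones, which is the usual price of fitting several parameters at once.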

3.2 The notion of a lossless map 

Since the time-ordered data (TOD) contains all the information we have, 
computing F directly from the TOD places a rock-bottom lower limit on 
the error bars we can hope to attain. Although these minimal error bars 
can generally be attained with a brute-force likelihood analysis of the TOD, 
this unfortunately tends to be computationally unfeasible in practice, since
even in the Gaussian case that we are considering, this involves repeated
determinant calculations (essentially Cholesky decompositions) of n x n
matrices. For COBE, we had n ~ 2 x 10^, as compared to m = 6144. This is
why map-making is such a useful intermediate step, reducing the data set
to a more manageable size. By computing F from the map, we can assess
the effectiveness of the map-making method. If $F^{\mathrm{map}} = F^{\mathrm{tod}}$, the map is
lossless in the sense that it contains all the cosmological information that
the TOD did, in a distilled form. Conversely, if $F^{\mathrm{map}} \neq F^{\mathrm{tod}}$, some useful
information has been destroyed in the map-making process. 

Are any of the above-mentioned methods lossless? First of all, note that 
F remains unchanged if we multiply our data set by an invertible matrix 
B: if the new data set is x' = Bx, then $C' = B C B^t$, $G'_i = (C')^{-1} C'_{,i} =
B^{-t} G_i B^t$, and F' = F. This is simple to understand intuitively: x' must
clearly contain the same information that x does, since x can be computed
from x'. This elementary observation immediately tells us that methods
3-8 are information-theoretically equivalent, giving the same F, since
each of these six W-matrices can be obtained from each of the others by
multiplying by some invertible matrix from the left. For instance, we can
compute a Wiener-filtered map x' from a COBE map x by multiplying it by
$B = [S^{-1} + A^t N^{-1} A]^{-1} [A^t N^{-1} A]$, as was done by Bunn et al. (1994) and
Bunn et al. (1996).
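
The conversion from a COBE map to a Wiener-filtered map can be sketched as follows. The matrices and the data vector are hypothetical toy values; the point is only that the Wiener map is obtained from the COBE map without returning to the TOD.

```python
import numpy as np

scan = np.array([0, 1, 2, 0, 1, 2, 2, 1])   # hypothetical scan pattern
n, m = len(scan), 3
A = np.zeros((n, m))
A[np.arange(n), scan] = 1.0
N = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))        # toy noise
S = 4.0 * 0.7 ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))  # toy signal
y = np.random.default_rng(1).normal(size=n)  # any TOD vector will do

Ninv = np.linalg.inv(N)
Sigma_inv = A.T @ Ninv @ A                               # A^t N^{-1} A
W_cobe = np.linalg.solve(Sigma_inv, A.T @ Ninv)          # Method 3
W_wiener = S @ A.T @ np.linalg.inv(A @ S @ A.T + N)      # Method 4

# B = [S^{-1} + A^t N^{-1} A]^{-1} [A^t N^{-1} A], an invertible m x m matrix
B = np.linalg.inv(np.linalg.inv(S) + Sigma_inv) @ Sigma_inv

x_cobe = W_cobe @ y
x_wiener_from_map = B @ x_cobe     # uses only the m map pixels
x_wiener_from_tod = W_wiener @ y   # uses all n TOD samples
```

Both routes give the same Wiener map, and since B is invertible the step can also be undone.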

3.3 A proof that methods 3-8 lose no information 

We will now compute the Fisher matrices $F^{\mathrm{map}}$ from the maps made with
methods 3-8. As mentioned above, they are all identical, and do not change
if we multiply W from the left by an arbitrary invertible matrix. Let us
take advantage of this by making the simple choice $W = A^t N^{-1}$ in our
calculation (for instance, method 3 can be put in this form by multiplying
its W by $\Sigma^{-1} = [A^t N^{-1} A]$). This gives

$C^{\mathrm{map}} \equiv \langle \tilde{x} \tilde{x}^t \rangle = A^t N^{-1} A S A^t N^{-1} A + A^t N^{-1} N N^{-1} A$

$= \Sigma^{-1} [I + S \Sigma^{-1}]$, (9)

$C^{\mathrm{map}}_{,i} = \Sigma^{-1} S_{,i} \Sigma^{-1}$, (10)

$G^{\mathrm{map}}_i = [I + S \Sigma^{-1}]^{-1} S_{,i} A^t N^{-1} A$. (11)

For the time-ordered data, the corresponding expressions are 

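$C^{\mathrm{tod}} \equiv \langle y y^t \rangle = A S A^t + N$, (12)
$C^{\mathrm{tod}}_{,i} = A S_{,i} A^t$, (13)
$G^{\mathrm{tod}}_i = [A S A^t + N]^{-1} A S_{,i} A^t = [I + N^{-1} A S A^t]^{-1} N^{-1} A S_{,i} A^t$. (14)

Since matrices of the form $[I + M]^{-1}$ can be expanded as a geometric series
$I - M + M^2 - M^3 + \ldots$, we obtain

$G^{\mathrm{tod}}_i = [I - N^{-1} A S A^t + N^{-1} A S A^t N^{-1} A S A^t - \ldots] N^{-1} A S_{,i} A^t$

$= N^{-1} A [I - S A^t N^{-1} A + S A^t N^{-1} A S A^t N^{-1} A - \ldots] S_{,i} A^t$

$= N^{-1} A [I + S A^t N^{-1} A]^{-1} S_{,i} A^t$

$= N^{-1} A [I + S \Sigma^{-1}]^{-1} S_{,i} A^t$. (15)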



Comparing equations (11) and (15), we see that the matrices $G^{\mathrm{map}}_i$ and
$G^{\mathrm{tod}}_i$ differ only by a cyclic permutation, moving the factor $N^{-1} A$ from
one side to the other. Since a trace of a product of matrices is invariant
under cyclic permutations, we obtain our desired result:

$F^{\mathrm{map}}_{ij} = \frac{1}{2} \mathrm{tr}[G^{\mathrm{map}}_i G^{\mathrm{map}}_j] = \frac{1}{2} \mathrm{tr}[G^{\mathrm{tod}}_i G^{\mathrm{tod}}_j] = F^{\mathrm{tod}}_{ij}$. (16)

In other words, methods 3-8 are all lossless, regardless of what parameters 
we choose to estimate. 
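
The result of equation (16) can be checked numerically on a toy problem. The sketch below assumes a hypothetical two-parameter signal model, $S = \theta_1 S_1 + \theta_2 S_2$, with correlated noise, and compares the Fisher matrix of the TOD with those of maps made by Methods 3, 4 and 2.

```python
import numpy as np

def fisher(C, dC):
    """F_ij = (1/2) tr[C^{-1} C_{,i} C^{-1} C_{,j}]."""
    G = [np.linalg.solve(C, d) for d in dC]
    return np.array([[0.5 * np.trace(Gi @ Gj) for Gj in G] for Gi in G])

def fisher_of_map(W, A, S, dS, N):
    """Fisher matrix of the linear map W y, where y = A x + n."""
    WA = W @ A
    C = WA @ S @ WA.T + W @ N @ W.T
    return fisher(C, [WA @ d @ WA.T for d in dS])

scan = np.array([0, 1, 2, 0, 1, 2, 2, 1])   # hypothetical scan pattern
n, m = len(scan), 3
A = np.zeros((n, m))
A[np.arange(n), scan] = 1.0
N = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # correlated noise
S1 = 0.7 ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
S2 = np.eye(m)
S, dS = 4.0 * S1 + 1.0 * S2, [S1, S2]       # hypothetical fiducial parameters (4, 1)

Ninv = np.linalg.inv(N)
W_cobe = np.linalg.solve(A.T @ Ninv @ A, A.T @ Ninv)     # Method 3
W_wiener = S @ A.T @ np.linalg.inv(A @ S @ A.T + N)      # Method 4
W_bin = np.linalg.solve(A.T @ A, A.T)                    # Method 2

F_tod = fisher_of_map(np.eye(n), A, S, dS, N)   # W = I leaves the TOD untouched
F_cobe = fisher_of_map(W_cobe, A, S, dS, N)
F_wiener = fisher_of_map(W_wiener, A, S, dS, N)
F_bin = fisher_of_map(W_bin, A, S, dS, N)
```

In this setup `F_cobe` and `F_wiener` reproduce `F_tod` to rounding error, whereas `F_tod - F_bin` is positive semidefinite and typically non-zero when the noise is correlated.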

4 CONCLUSIONS 

We have compared ten methods for making maps from CMB data. We 
found that for the Gaussian case, both the COBE method and assorted 
variants of Wiener filtering are optimal in the sense that they retain all the 
cosmological information that was present in the time-ordered data. The 
choice between them is mainly one of numerical convenience, since these six 
maps (and indeed any lossless maps) can all be computed from one another 
without going back to the TOD. The linear methods 1 and 2, on the other 
hand, destroy information whenever they differ from Method 3, i.e., unless 
$M = N^{-1}$ in Method 1 or $N \propto I$ in Method 2. Among other things, this
means that in the presence of 1/f noise, we should not simply average the
observations in each pixel, since we can do better. The non-linear methods
9 and 10 also destroy information unless they can be inverted to reproduce,
say, map 3, the map from the COBE method.

Our proof that methods 3-8 are lossless was strictly valid only if both 
signal and noise are Gaussian. However, as long as the noise in the TOD is 
Gaussian (after appropriate removal of glitches, known systematics etc.), the 
same results hold even if the sky pattern is non-Gaussian. Letting $f_x(x; \Theta)$
denote the (not necessarily Gaussian) probability distribution for the map
x, the likelihood function for the parameter vector $\Theta$ is

$L(\Theta) = \int f_n(y - Ax) f_x(x; \Theta) \, d^m x$, (17)

where $f_n$ is the Gaussian noise probability distribution $f_n(n) \propto \exp[-n^t N^{-1} n / 2]$.
Proportionality constants that are independent of $\Theta$ are of course irrelevant
in a likelihood analysis, so since

$L(\Theta) \propto e^{-\frac{1}{2} y^t N^{-1} y} \int e^{-\frac{1}{2} x^t A^t N^{-1} [Ax - 2y]} f_x(x; \Theta) \, d^m x$

$\propto \int e^{-\frac{1}{2} x^t \Sigma^{-1} [x - 2\tilde{x}]} f_x(x; \Theta) \, d^m x$, (18)
where $\tilde{x} = \Sigma A^t N^{-1} y$ is the map made with Method 3 and $\Sigma = (A^t N^{-1} A)^{-1}$
as before, we see that our likelihood function depends on the data y only
indirectly, via $\tilde{x}$. In other words, we can compute the full TOD likelihood
function directly from the map made with the COBE method. This shows
that even if the CMB fluctuations are non-Gaussian, Method 3 (and
consequently also Methods 4-8) is lossless, so that we will get the strongest possible
constraints on cosmological models by splitting the data processing into two 
steps, as in Figure 1: 

1. Use one of the simple linear methods 3-8 to compress the TOD into a 
map. 

2. Use this map as the starting point for any non-linear data processing 
(for removing point sources, for detecting topological defects, etc.). 
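
The algebra behind equation (18) can be checked numerically: for any trial sky x, the exponent computed from the TOD differs from the one computed from the COBE map only by a term that does not depend on x. The sketch below uses hypothetical toy matrices and random trial maps.

```python
import numpy as np

def chi2_tod(x, A, Ninv, y):
    """(y - A x)^t N^{-1} (y - A x), evaluated from the full TOD."""
    r = y - A @ x
    return r @ Ninv @ r

def chi2_map(x, Sigma_inv, x_map):
    """x^t Sigma^{-1} [x - 2 x_map], evaluated from the Method 3 map only."""
    return x @ Sigma_inv @ (x - 2.0 * x_map)

rng = np.random.default_rng(2)
scan = np.array([0, 1, 2, 0, 1, 2, 2, 1])   # hypothetical scan pattern
n, m = len(scan), 3
A = np.zeros((n, m))
A[np.arange(n), scan] = 1.0
N = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # toy noise
y = rng.normal(size=n)                       # any TOD vector will do

Ninv = np.linalg.inv(N)
Sigma_inv = A.T @ Ninv @ A
x_map = np.linalg.solve(Sigma_inv, A.T @ Ninv @ y)   # COBE-method map

trial_maps = rng.normal(size=(5, m))
offsets = [chi2_tod(x, A, Ninv, y) - chi2_map(x, Sigma_inv, x_map)
           for x in trial_maps]
expected_offset = y @ Ninv @ y               # independent of x
```

Because the offset is the same for every trial map, it factors out of the integral in equation (17) and drops out of the likelihood analysis.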

The fact that Methods 9 and 10 destroy information is of course not an 
argument against nonlinearly processed maps per se even in the Gaussian 
case, since maps have other uses than parameter estimation. The point 
is simply that these methods are inferior (slower and not lossless) in the 
process of data compression from TOD to a map, so if one wants for instance 
a maximum entropy map from a huge data set, it is better to split the data 
processing into the above-mentioned two steps. 

This is quite good news for the CMB community, since it has recently 
been demonstrated (Wright et al. 1996; Wright 1996) that clever algorithms 
make it numerically feasible to make maps with the COBE method (Method 
3) even when millions of pixels are involved. This makes it the natural 
choice as the first step in the data compression pipeline, since the other 
lossless methods can be computed directly from this map if desired, without 
using the TOD. Two additional desirable properties of the COBE method
reinforce this conclusion:

• It is independent of S, i.e., of cosmological model assumptions. 

• With a well chosen observational strategy, the covariance matrix $\Sigma$ of
the map is approximately diagonal (Wright 1996), simplifying subsequent
analysis.

In conclusion, although much work remains to be done on other aspects 
of CMB data analysis, the map-making problem now appears to be under 
control, since we are armed with methods that are both optimal and feasible. 

Support for this work was provided by NASA through a Hubble Fellow- 
ship, #HF-01084.01-96A, awarded by the Space Telescope Science Institute, 
which is operated by AURA, Inc. under NASA contract NAS5-26555. 






5 REFERENCES 



Bennett, C. L. 1996, ApJ, 464, L1.
Bunn, E. F. et al. 1994, ApJ, 432, L75.
Bunn, E. F., Hoffman, Y. & Silk, J. 1996, ApJ, 464, 1.
Dodelson, S. 1996, preprint astro-ph/9512021.
Hinshaw, G. et al. 1996, ApJL, 464, L17.
Jansen, D. J. & Gulkis, S. 1992, "Mapping the Sky With the COBE-DMR", in
"The Infrared and Submillimeter Sky after COBE", eds. M. Signore & C.
Dupraz (Dordrecht: Kluwer).
Jungman, G., Kamionkowski, M., Kosowsky, A. & Spergel, D. N. 1996, Phys.
Rev. D, 54, 1332.
Knox, L. 1996, preprint astro-ph/9606066.
Press, W. H., Flannery, B. P., Teukolsky, S. A. & Vetterling, W. T. 1992,
Numerical Recipes, 2nd ed. (New York: Cambridge Univ. Press).
Smoot, G. F. et al. 1992, ApJ, 396, L1.
Tegmark, M. & Bunn, E. F. 1995, ApJ, 455, 1.
Tegmark, M., de Oliveira-Costa, A., Devlin, M. J., Netterfield, C. B., Page, L. &
Wollack, E. J. 1996, ApJL, 474, L77.
Tegmark, M. & Efstathiou, G. 1996, MNRAS, 281, 1297.
Tegmark, M. & Efstathiou, G. 1997, in preparation.
Tegmark, M., Taylor, A. & Heavens, A. F. 1997, preprint astro-ph/9603021.
White, M. & Bunn, E. F. 1995, ApJ, 443, L53.
Wiener, N. 1949, Extrapolation and Smoothing of Stationary Time Series (NY:
Wiley).
Wright, E. L. 1996, preprint astro-ph/9612006.
Wright, E. L., Hinshaw, G. & Bennett, C. L. 1996, ApJL, 458, L53.
Zaroubi, S. et al. 1995, ApJ, 449, 446.






[Figure 1 schematic: TIME-ORDERED DATA -> (map-making method W) -> SKY MAP
(Pixel, ΔT) -> PARAMETER ESTIMATES ($\Omega$, $\Omega_b$, $\Lambda$, h, n, $n_T$, Q, T/S)]

Figure 1: Map-making as an intermediate step in measuring cosmological
parameters. If $F^{\mathrm{map}} = F^{\mathrm{tod}}$, then the map-making method W is lossless,
which means that parameter estimation based on the map gives just as small
error bars as using all the time-ordered data.